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This paper describes the design, construc- 
tion and characterization of two DNA 
molecules that function as prototype Stan- 
dard Reference Materials for use in the 
determination of DNA sequences. The first 
prototype reference DNA molecule was 
constructed to function as a true reference 
DNA sequence encoding the human mi- 
tochondrial hypervariable region 1. The 
second DNA molecule was designed to 
be a molecular DNA "tag" for use in the 
identification of DNA Standard Refer- 
ence Materials. DNA molecules were 
chemically and enzymatically synthe- 
sized, cloned in molecular cloning vectors 
and sequenced using manual and auto- 
mated fluorescent DNA sequencing meth- 
ods. The sequence of the cloned human 
mitochondrial HV1 DNA shows six base 



pair differences from the original pub- 
lished DNA sequence. The molecular DNA 
tag sequence uses the universal genetic 
code with one letter amino acid abbrevia- 
tions to spell the words "NIST DNA 
STANDARD REFERENCE MATERIAL." 
The need and utility of an expanded 
molecular code to uniquely identify DNA 
molecules is discussed. The use of these 
DNA molecules in the development of 
DNA Standard Reference Materials is 
also described. 
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1. Introduction 

The second half of this century has seen a remarkable 
scientific revolution in our capacity to elucidate, inter- 
pret and manipulate genetic information. This revolution 
was created by modern molecular biology and results 
from our ability to efficiently manipulate, purify, syn- 
thesize, sequence, and express DNA. These techniques 
provide the foundation for the modern multi-billion dol- 
lar biotechnology industry. DNA sequence determina- 
tion of genes and genomes provides unprecedented ac- 
cess to the genetic structure of deadly viruses as well as 
life saving gene products. 



The principle method used today for DNA sequence 
determination is called the "dideoxy" method and was 
developed by Fred Sanger and Alan Coulson in the 
1970's [1]. Most of the DNA sequences determined over 
the past 1 5 years have relied on the manual reading of an 
autoradiographic image of size fractionated radioac- 
tively labeled DNA sequencing reaction products and 
has required considerable skill to accurately deduce and 
assemble DNA sequences. 

The development and commercialization of auto- 
mated DNA sequencing instruments [2-7] has resulted 
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in the wide spread use of "automated" DNA sequence 
determination hardware and software. The computer 
software "calling" the DNA sequence from data gener- 
ated by the instruments removes a large part of the 
human interpretation of data. The quality and accuracy 
associated with the automated DNA sequence determi- 
nation has been demonstrated in many cases to be as 
good or better than manual methods. 

DNA sequence quality is quite variable and is influ- 
enced by many factors that contribute to the overall 
process of DNA sequence determination. These factors 
include "wetware," hardware, and software. The wet- 
ware consists of the chemistry, enzymology, and molec- 
ular biology that produce the DNA substrates and DNA 
sequencing reaction products. An instrument's hardware 
resolves and detects the reaction products and the soft- 
ware "calls" the DNA sequence from the complex col- 
lection of peaks produced by the hardware. High quality, 
well designed DNA molecules of precise sequence 
would be useful in optimizing and characterizing instru- 
ment performance as well as providing internal stan- 
dards that can be used to measure and document DNA 
sequence quality. 

DNA based standards have been developed over the 
past two decades to meet a wide variety of needs includ- 
ing molecular weight estimations of DNA, RNA and 
protein molecules [18]. The restriction fragment length 
polymorphism standard reference material (NIST SRM 
2390) is the first NIST certified DNA standard. The 
second NIST certified DNA standard for PCR based 
human identification has recently been released (NIST 
SRM 2391). A number of new standards for DNA se- 
quence determination are in development and prototype 
molecules have been assembled and characterized. Two 
examples of the prototype DNA sequencing standards 
are described in this paper. 

Identification of biological samples and individuals 
has been revolutionized by the analysis of DNA se- 
quences. Analysis of genomic DNA sequence has 
changed science and in particular medicine and forensic 
science in fundamental ways. When genomic DNA is 
limited or degraded by circumstance, the mitochondrial 
DNA is frequently analyzed for the purpose of human 
identification. 

The human mitochondrial genome sequence has been 
determined [8] to consist of 16 569 base pairs of DNA 
and has a highly variable region called the "D loop" of 
approximately 680 base pairs (bp). The 400 bp hyper- 
variable one (HV1) sub-sequence of the D loop has been 
used for human identification purposes. We have con- 
structed and characterized a number of DNA clones of 
the D loop region. One of the clones containing the HV1 
sequence from coordinates 15 975 to 16 420 [8] has 
been designated JT3 and is the first prototype DNA 



sequencing standard for the human mitochondrial HV1 
region. 

Ideally, NIST DNA standards should be chemically 
well defined, straightforward to construct and easily 
identified as molecules traceable to NIST. DNA se- 
quencing standards represent an interesting example of 
self replicating molecules that can be easily propagated 
ad infinitum. This raises a number of interesting ques- 
tions. One question is that of molecular identification, 
i.e., once the molecule is out of a labeled tube or bottle, 
how can it be identified? To address the question of 
identifying molecules at the molecular level and to 
provide one solution to this problem, a unique DNA 
sequence was designed that could be covalently inte- 
grated into any NIST DNA molecule to facilitate the 
identification of NIST DNA standards. A 108 base pair 
double stranded DNA molecule was synthesized that can 
be translated in one reading frame using the universal 
genetic code with the one letter amino acid abbreviation 
to 4 'spell out' ' NIST *DNA ^STANDARD 
PREFERENCE ^MATERIAL. The * symbol represents 
a punctuation that is encoded by translation stop codons. 
This DNA sequence functions as a "molecular tag" to 
uniquely identify this DNA molecule and molecules 
derived from this parent molecule. 

Nature has evolved the genetic code to specify twenty 
amino acids. This code with one letter amino acid ab- 
breviations, allowed the construction of the molecular 
tag described in the preceding paragraph. It is useful 
and functional to uniquely identify NIST DNA materi- 
als. Because it contains only 20 letters, it cannot be used 
to adequately construct all molecular tag sequences that 
one might need. A description of alternative DNA en- 
coded tags is presented in the discussion. 



2. Materials and Methods 

Enzymes and chemicals. Restriction endonucleases, 
bacteriophage T4 DNA ligase, DNA polymerase, and 
polynucleotide kinase were purchased from New 
England Biolabs (Beverly, MA) 1 . Calf intestinal 
alkaline phosphatase was purchased from Boehringer- 
Mannhiem (Indianapolis, IN). Isopropyl-beta-D-thio- 
galactopyranoside, 5-bromo-4-chloro-3-indolyl-beta-D- 
galactoside, acrylamide, bis aery lamide, and phenol 
were obtained from Life Technologies Inc. (Gaithers- 
burg, MD). Deoxynucleoside triphosphates and 
dideoxynucleoside triphosphates were purchased from 



1 Certain commercial equipment, instruments, or materials are identi- 
fied in this paper to foster understanding. Such identification does not 
imply recommendation or endorsement by the National Institute of 
Standards and Technology, nor does it imply that the materials or 
equipment identified are necessarily the best available for the purpose. 
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Pharmacia LKB Biotechnology. 35 S dATP was pur- 
chased from Du Pont-New England Nuclear. 

Bacterial Hosts and Plasmids. Escherichia coli TGI 
was obtained from Dr. T. Gibson, Laboratory of Molec- 
ular Biology, Medical Research Council (Cambridge, 
England). M13mpl8 (9) was obtained from New Eng- 
land Biolabs. 

DNA procedures. Single-stranded bacteriophage 
Ml 3 DNA templates were isolated from the culture 
supernatants of TGI infected with Ml 3 phage grown in 
Luria-Bertani (LB) medium [10] . The replicative form 
of Ml 3 DNA was purified as described by Messing 
[11]. Ligation of DNA fragments was performed as de- 
scribed by Maniatis et al. [12]. Human DNA was iso- 
lated from mouth scrapings with bucal swabs and used 
for isolation of mitochondrial HV1 sequences. PCR us- 
ing synthetic primers (described below) was used for 
amplification of the HV1 region to produce a DNA 
fragment of 445 bp. This fragment was cloned into the 
Smal site of M13mpl8 in host TGI. 

DNA Synthesis and Purification. Oligonucleotides 
were synthesized with an Applied Biosystems model 
380B DNA synthesizer. 5' dimethoxytrityl bearing 
oligonucleotides were purified by reverse-phase HPLC 
on a Hamilton PRP-1 column (P/N 79426) using an 
increasing linear gradient of acetonitrile in 0.1 mol/L 
triethylammonium acetate, pH 7.0. The solutions con- 
taining HPLC purified oligonucleotides were concen- 
trated to dryness in a vacuum centrifuge, dissolved in 
200 julL of HPLC grade water, transferred to 1.7 mL 
microfuge tubes and again concentrated to dryness in a 
vacuum centrifuge. 5' dimethoxytrityl groups were re- 
moved by dissolving the oligonucleotides in 30 julL of 
volume fraction of 80 % acetic acid in water and incu- 
bated at room temperature for 10 minutes. To remove 
the dimethoxytrityl alcohol, organic impurities and to 
convert the DNA to a sodium-salt form, oligonucle- 
otides were ethanol precipitated by the addition of 5 julL 
of 3 mol/L sodium acetate, 100 julL of 100 % ethanol 
and incubated at °C for 5 min and recovered by cen- 
trifugation in a microcentrifuge. Oligonucleotide pellets 
were washed with 500 julL of ice cold mass fraction of 
80 % ethanol and trace amounts of ethanol removed in 
a vacuum centrifuge. The final oligonucleotide products 
were dissolved in HPLC grade water and quantitated by 
optical density measurement at 260 nm. 

Oligonucleotide sequences and assembly. Five oligo- 
nucleotides labeled NIST 1-5 with the sequences shown 
in Fig. 3 were synthesized and used to construct a 108 
base pair double stranded DNA molecule encoding the 
NIST molecular tag. The purified oligonucleotides 
NIST 1 -5 were phosphorylated with adenosine triphos- 
phate (ATP) and polynucleotide kinase, annealed, lig- 
ated with bacteriophage M13mpl8 Smal treated recipi- 



ent vector DNA [2], transformed into Escherichia coli 
strain TG-1, and plated on LB agar plates. Ml 3 plaques 
were picked and phage DNA isolated as described [10]. 
Two oligonucleotides, (L strand: 5"-TTAACTCCAC- 
CATTAGCACC-3', H strand: 5'-CACGGAGGATG- 
GTGGTCAAG-3') were synthesized, purified and used 
for amplification of the HV1 sequence. 

DNA sequence determination. DNA clones were se- 
quenced using the Ml 3 universal sequencing primer 
KMG2 (5'CACGACGTTGTAAAACGACGGCCAG3'). 
The DNA molecular tag sequence was determined man- 
ually using 35 S dATP and Sequenase (United States Bio- 
chemicals, Cleveland, OH) by the dideoxy method of 
Sanger et al. [1] as modified by Biggin et al. [13]. 
Fluorescent DNA sequencing was done using an Ap- 
plied Biosystems Division of Perkin-Elmer (ABD) (Fos- 
ter City, CA) model 373. 

The ABD DNA sequencing system uses four different 
fluorescent dyes to identify each base specific chain 
termination reaction. Each of the four bases is synthe- 
sized with a specific colored dye that emit at different 
wavelength maxima when excited by the argon laser 
light source. The four dideoxy reactions are pooled into 
a single sample and run together in one lane on a poly- 
acrylamide gel. The fluorescent emission signature of 
each molecule as it passes a fixed point 24 cm from the 
origin is detected and recorded by an Apple Quadra 650 
microcomputer using ABD software. Data analysis is 
carried out by the Applied Biosystems Analysis program 
and produces an "electropherogram" with the pre- 
dicted DNA sequence above the data peak profiles. 



3. Results 

3.1 DNA Molecules With Human HV1 Sequences 

Figure 1 presents an "electropherogram" of a cloned 
DNA fragment encoding human HV1 sequences pro- 
duced by an Applied Biosystems Model 373 fluorescent 
based DNA sequencer. An electropherogram is an im- 
age constructed by the computer software that displays 
traces of peaks using four different colors each repre- 
senting one of the four different dyes using in the DNA 
sequencing reaction. The peaks produced at the begin- 
ning of the electropherogram correspond to the smaller 
DNA fragments that reach the instrument's detector 
first. Subsequent peaks correspond to DNA fragments 
of increasing length. The DNA template used to gener- 
ate this data is an M13mpl8 vector containing DNA 
sequences from the human mitochondrial HV1 region 
(from nucleotide 15975 to nucleotide 16420 [8] cloned 
into the Smal site of the vector. The orientation of the 
HV1 insert relative to the polylinker sites of the 
M13mpl8 vector is: 5' EcoRl-Smal-HVl nucleotide 
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15975-nucleotide 16420-SmaI-HindIII-3\ This means 
the L strand sequence [8] corresponds to "top" strand 
of the Ml 3 vector relative to the Plac promoter upstream 
of the Ml 3 polylinker site. This DNA molecule has been 
designated JT3. The DNA sequence shown in Fig. 1 is 
the H strand sequence and was sequenced with the 
KMG2 universal primer using dye labeled dideoxy ter- 
minator chemistry with the Taq polymerase in a cycle 
sequencing reaction [17]. This sequence (091594 15) is 
also presented in figure 2, line 2. 

The HV1 PCR product was sequenced directly using 
the H strand primer with dye terminator chemistry. This 
sequence (1 12594 02) is presented in Fig. 2 on line 1. In 
addition, the DNA sequence of a clone of the HV1 PCR 
product in the opposite orientation of the JT3 clone was 
is determined. This sequence (090794 21) is presented 
in Fig. 2, line 3. These three sequences were used to 



generate a composite sequence of the HV1 shown in 
Fig. 2, line 5. The composite was compared with the 
Anderson DNA sequence (Mt. hum. 1580 in Fig. 2, line 
4) [8] and showed six differences. The six differences 
are marked with solid dots below the composite se- 
quence. 

Inspection of the data suggests that these differences 
are real and are representative of natural differences 
among individuals in this sequence. The differences 
were either A for G or G for A substitutions and are 
denoted with solid dots in Fig. 2. No insertions or dele- 
tions were observed. In addition, the differences were 
found in all three DNA sequences analyzed; the cloned 
HV1 PCR fragment in JT3, the HV1 PCR fragment 
cloned in the opposite orientation in M13 and the HV1 
PCR fragment. 



4gg) 1 12594 02 CACGGAGGATGGTGGTCAAGGGACCCCTATCTGAGGGGGGTCATCCATGGGGACGAGGAGGGATTTGACTGTAATGTNCAATGW 

<jg) 091594 15 TGATTTCACGGAGGATGGTGGTCAAGGGACCCCTATCTGAGGGGGGTCATCCATGGGGACGAGGAGGGATTTGACTGTAATGTGCTATGT 
^g 090794 21 GGTCAA:GGACCCCTATCTGAGGGGGGTCATCCATGGGGACGAG:AGGGATTTGACTGTAATGTGCTATGT 

4@ MthumlBfin. TGATTTCACGGAGGATGGTGGTCAAGGGACCCCTATCTGAGGGGGGTCATCCAT G GGGACGAGAAGGGATTTGACTGT A ATGTGCTA TGT 
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TGATTTCACGGAGGATGGTGGTCAAGGGACCCCTATCTGAGGGGGGTCATCCATGGGGACGAGGAGGGATTTGACTGTAATGTGCTATGT 



4§!) 1 12594 02 CCGGTAAATGGCTTAATGTCCWATGTACTGTTAAGGGTGGGKAGGTTTGTTGGTAWCCTAGTGGGTGAGGGGTGGCTTTGAAGTTGCAGT 
<@ 091594 15 ACGGTAAATGGCTTTATGTGCTATGTACTGTTAAGGGTGGGTAGGTTTGTTGGTATCCTAGTGGGTGAGGGGTGGCTTTGAAGTTGCAGT 
lg 090794 21 ACGGTAAATGGCTTTATGTGCTATGTACTGTTAAGGGTGGGTAGGTTTGTTGGTATCCTAGTGGGTGAGGGGTGGCTTTGAAGTTGCAGT 
-(g) Mthum ififin.„ ACGGTAAATGGCTTTATGTACTATGTACTGTTAAGGGTGGGTAGGTTTGTT G GTATCCTAGTGGGT G AGGGGTGG CTTTgGAGTTGCAGT 

| 130 1140 1150 1160 1170 I 
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I 110 
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ACGGTAAATGGCTTTATGTGCTATGTACTGTTAAGGGTGGGTAGGTTTGTTGGTATCCTAGTGGGTGAGGGGTGGCTTTGAAGTTGCAGT 



-(g 112594 02 
<g 091594 15 
<g 090794 21 
4£H Mt.hum. 1580., 



-(g 112594 02 
^g 091594 15 
<g 090794 21 
-(g) Mt.hum. 1580., 



-(g 112594 02 
<g 091594 15 
<g| 090794 21 
-1^1) Mt.hum. 1580. 



TGATGTGTGATAGTTGAGGGTTGATTGCTGTACTTGCTTGTAAGCATGGGGAGGGGGTTTTGATGTAGGTTGGGTTTTTATGTACTACAG 
TGATGTGTGATAGTTGAGGGTTGATTGCTGTACTTGCTTGTAAGCATGGGGAGGGGGTTTTGATGTAGGTTGGGTTTTTATGTACTACAG 
TGATGTGTGATAGTTGAGGGTTGATTNCTGTACTTGCTTGTAAGCATGGGGAGGGGGTTTTGATGTAGGTTGGGTTTTTATGTACTACAG 
.TqATQTGTGATAgTTGAGGgTTGATT GCT GTACTTG CT TgTAAgC A TggggAg gGg GTTTTgATgTggATTGGGTTTTTATgTACTACAG 

180 1190 1200 1210 1220 1230 1240 1250 1260 I 

TGATGTGTGATAGTTGAGGGTTGATTGCTGTACTTGCTTGTAAGCATGGGGAGGGGGTTTTGATGTAGGTTGGGTTTTTATGTACTACAG 



..«►...♦. 



GTGGTCAAGTATTTATGGTACCGTACAATATTCATGGTGGCTGGCAGTAATGTACGAGATACATAGCGGTTGTTGATGGGTGAGTCAATA 
GTGGTCAAGTATTTATGGTACCGNACAATATTCATGGTGGCTGGCAGTAATGTACGAGATACATAGCGGNTGTTGATGGGTGAATCAATA 
GTGNTCAAGTATTTATGGTACCGTACAATATTCATGGTGNCTGGCAGTAATGTACGAGATACATAGCGGTTGTTGATGGGTGAGTCAATA 

270 1280 1290 1300 1310 1320 1330 1340 1350 I 
GTGGTCAAGTATTTATGGTACCGTACAATATTCATGGTGGCTGGCAGTAATGTACGAGATACATAGCGGTTGTTGATGGGTGAGTCAATA 



CTTGGGTGGTACCCAAATCTGCTTCCCCATGAAAGAACAGAGAATAGTTTAAAWTAGWATCTTASCNTTGGGTGCTAATGGTGGAG 
CTTGGGTGGTACCCAAATCTTCTTCCCCATGAAAGAACAGAGAATAGTTTAAATTAGAATCTTAGCTTNGGNTGCTAATGGTGGAG 
CTTGGGTGGTACCCAAATCTGCTTCCCCATGAAAGAACAGAGAATAGTTTAAATTAGAArCTTACTTTGAANNNCC 

360 1370 1380 1390 1400 1410 1 420 I 430 I 440 

CTTGGGTGGTACCCAAATCTGCTTCCCCATGAAAGAACAGAGAATAGTTTAAATTAGAATCTTAGCTTTGGGTGCTAATGGTGGAG 



Fig. 2. Three DNA sequences determined from: the HV1 PCR fragment cloned in M13 as clone JT3 (line 1), the HV1 PCR fragment sequenced 
directly using the H strand primer (the complement of this sequence is presented in Fig. 2 on line 3) and a clone of the HV1 PCR product in the 
opposite orientation of the JT3 clone (this sequence is presented in Fig. 2, line 11). These three sequences were used to generate a composite 
sequence of the HV1 fragment, shown in Fig. 2, line 12. The composite was compared with the Anderson DNA sequence (Fig. 2, line 14) [8] and 
showed six differences, denoted by the symbol *. No insertions or deletions were present in this HV1 sequence. A total of six bp differences are 
marked. See text for discussion. 
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3.2 DNA Molecules With Tag Sequences 

Figure 3 presents the DNA sequence designed to en- 
code the codons specifying the DNA molecular tag. The 
first triplet, AAC, encodes asparagine, which is abbrevi- 
ated N. The second triplet ATC encodes isoleucine, 
which is abbreviated I. The third triplet, TCG, encodes 
serine, which is abbreviated S. The remaining base 
triplets encode the amino acid written below the se- 
quence with the exception of TAA which encodes a stop 
codon, denoted by the * symbol in Fig. 3. This was used 
for punctuation of the molecular tag words. Note that 
subsequent asparagines are encoded by the triplet AAT 
The double stranded DNA molecule was constructed 
from five single stranded synthetic oligonucleotides as 
described in materials and methods. The assembled 
double stranded sequence was cloned into the Smal site 
of the Ml 3 mpl8 vector and clones with insert DNA 
were purified and characterized by DNA sequence anal- 
ysis using two DNA sequencing methods. 

Figure 4 shows an autoradiograph of the sequence of 
five clones with the sequence of the left most clone 
(KM4) written in the margin. The sequence noted starts 
at the boundary between the M13 cloning vector's Smal 
site (5'-CCCC-3') and the cloned insert DNA (5'- 
AACA-3')- The DNA sequence determination shows 
that all 108 bp are contained in this clone and are in the 
precise order shown in Fig. 3. No additional bp are 
apparent from this analysis. 

Figure 5 (top part) shows the DNA sequence of clone 
KM4 using the ABD Model 373A automated DNA 
sequencing instrument with fluorescent dye terminator 
chemistry. The sequence noted above each peak in the 
electropherogram represents the automated DNA base 
calling software version 2.0.1 used on the Applied 
Biosystems 373A. The transition between the Ml 3 
cloning vector DNA sequence (5'-CCCC-3') and the 
cloned insert DNA sequence (5'-AACA-3') occurs be- 
tween positions marked 43 and 44 in the figure. The 
insert DNA sequence continues to position 15 1 at which 
point the sequence returns to the vector sequence (at 
position 152). The 108 bp sequence is complete as de- 
termined using this method. 



The DNA sequence determined by both methods is in 
agreement with the sequence represented in Fig. 3. Se- 
quence of DNA clones with this insert in the opposite 
orientation (data not presented) are also consistent with 
the DNA sequence shown in Fig. 3. These three inde- 
pendent data sets demonstrate that the molecular DNA 
tag sequence has been accurately synthesized, cloned 
and can be accurately sequenced using standard manual 
and automated methods. 

3.3 DNA Molecules With Tag Sequence Errors 

Figure 5 (bottom half) shows the DNA sequence of 
clone KM8 using the fluorescent dye terminator chem- 
istry on the 373A instrument. The DNA sequence data 
demonstrates that a single bp has been deleted in this 
clone, corresponding to the G at position 137 in clone 
KM-4 shown in the upper electropherogram. This single 
bp deletion produces a frame shift mutation in the 
molecular tag sequence written above the DNA se- 
quence in the figure. The word "MATERIALS" is now 
altered to the nonsense word "MATNGSPSV". The 
single bp deletion probably results from an error in 
either the synthesis and assembly of the oligonucleotides 
into double stranded DNA or during the cloning process. 



4. Discussion 

We have designed, constructed and characterized by 
DNA sequence analysis a number of DNA molecules for 
use as reference materials. Two of these molecules have 
been selected as prototype standard reference materials 
to facilitate DNA sequence determination. 

DNA molecules that function as true reference se- 
quences can be constructed to encode any one of a 
number of different sequences. We chose to construct a 
clone encoding the human mitochondrial HV1 sequence 
because of the widespread use of this sequence in the 
identification of humans and human tissues. The HV1 
region is among the most variable in the mitochondrial 
genome, with one bp difference every one hundred 



NIST 1 I NIST 2 I NIST 3 

I I 

AACATCTCGACATAAGACAATGCATAATCGACTGCTAATGATGCrAGAGATTAAAGAGAGTTTGAAAGAGAAAATTGTGAATAAATGGCAACCGAAAGGATCGCCCTC 
T TGTAGAGCTGTATTCT GTTACGTA TTAGCTGACGATTACTACGATCTCT AATTTCT CTC A AACnTCTCT T T TAACACTTATTT ACCGTTGGCT TTCC TAGCGGGAG 



NIST 4 



NIST 5 



NIST*DNA*STANDARD*REFERENCE*MATERIAL 

Fig. 3. Nucleotide sequence of a synthetic double stranded DNA molecule designed to function as a molecular DNA tag. The upper strand has been 
translated using the universal genetic code with one letter abbreviations for the amino acids. The asterisks represent stop codons and are used to 
punctuate translated sequence. 
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Fig. 4. An autoradiographic image of a DNA sequencing gel. The DNA sequence is determined by ' 'reading' ' the length and position of each DNA 
molecule. The DNA sequence written on the side of the x-ray film provides orientation and interpretation of the film image. 
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nucleotides, on average [16]. The HV1 sequence was 
amplified by PCR, cloned in bacteriophage Ml 3 and the 
sequences of both the amplified DNA as well as the 
cloned sequences were determined. The sequence shows 
that in fact, the JT3 clone that we have constructed 
contains six bp differences with the published sequence 
over a sequence of 445 bp, as shown in Fig. 2. 

Reference molecules can be used as internal standards 
to measure and document the quality of DNA sequence 
determination. The JT3 molecule that contains the HV1 
cloned fragment should prove useful to the forensic ap- 
plications that analyze human mitochondrial DNA se- 
quences [16]. In addition, the JT3 molecule provides a 
useful DNA template for interlaboratory testing of the 
overall DNA sequencing process. 

One interesting problem presented by the use of re- 
combinant DNA molecules of the Ml 3 type described 
above for the human HV1 sequences is the problem of 
labeling or tagging the recombinant molecule as a NIST 
reference material. One solution to the tagging of DNA 
molecules is to construct a DNA sequence that can spell 
words that are easily recognized. 

A number of DNA molecular tag sequences were 
considered. Three components were considered impor- 
tant in the design. The first component was ease of 
identification and connecting the sequence to the Na- 
tional Institute of Standards and Technology. This was 
accomplished by creating a sequence that could be 
translated using the universal genetic code into protein 
sequence with the one letter abbreviation used for each 
amino acid. In addition, the DNA sequence coding for 
the translation stop codon (TAA) was used to separate 
and punctuate each word. The second design component 
important in the tag sequence was size of the DNA 
sequence. The DNA sequence needed to be large 
enough to be easily identified but small enough to be 
completely sequenced in one "pass" through the se- 
quence using current DNA sequencing methods. The 
current limit of "one pass" sequencing is about 450 
bases of sequence data. This limit is increasing because 
of longer gels, better DNA polymerases and improved 
software and should routinely approach 600 to 700 bases 
over the next several years. The third component was the 
simplicity of the sequence such that it could be synthe- 
sized, cloned and accurately sequenced. 

We designed a 108 bp DNA sequence that satisfies 
the three design components by using the universal ge- 
netic code with one letter amino acid abbreviations to 
spell the words "NIST DNA STANDARD REFER- 
ENCE MATERIAL" shown in Fig. 3. The DNA se- 
quence has a fairly random distribution of bases with no 
obvious secondary structural components. Five oligonu- 
cleotides were synthesized based on the sequence shown 



in Fig. 3 as described in materials and methods. The 
oligonucleotides were assembled, cloned in M13mpl8 
and characterized by manual and automated DNA se- 
quence analysis. 

The data presented in Fig. 4 was generated using the 
manual dideoxy method with the Sequenase DNA poly- 
merase. The data presented in Fig. 5 was generated 
using the dye terminator method with the Taq DNA 
polymerase in a cycle sequencing format on an ABI 
373. Both methods show that the precise 108 bp se- 
quence shown in Fig. 3 is present in the clone designated 
KM4. Characterization of an additional clone, KM8 
shown in Fig. 5 (lower half), demonstrated that a single 
bp (the G at 137 in KM4) has been deleted in this clone. 
The single bp deletion causes a frame shift mutation and 
produces an altered non-sense molecular tag sequence, 
MATNGSPSV instead of MATERIALS, as shown in 
Fig. 5. 

The molecular tag sequence used in this work was 
constructed using the twenty letters of the naturally oc- 
curring amino acids specified by the "universal" ge- 
netic code. This scheme has six letters missing from the 
English alphabet and has no numbers or punctuation 
other than stop codons which limit its utility in design- 
ing and constructing molecular tags. One could create a 
triplet code similar to the universal code that Nature 
evolved and eliminate the code degeneracy. An example 
of such a simple code would include all 26 letters of the 
English alphabet, the numbers through 9 as well as 
commonly used punctuation characters. Another alter- 
native code would be a tetranucleotide code using all 
four DNA bases to provide 256 unique codons. One 
could use 255 codons to specify the complete set of 
characters in the ISO Latin- 1 character set. These codes 
could be used to design and synthesize DNA oligonucle- 
otides that could be assembled into mini-genes and used 
by any organization to identify their products using 
molecular tags in conjunction with DNA sequence de- 
termination. Both of these coding schemes have the 
advantage of being easily recognized because that 
words are spelled in English and no central database is 
required to maintain the code. A more efficient code 
would be to eliminate the need to spell words and sim- 
ply use a digital code in the base four. Code lengths of 
30 bp could provide 4 30 possible tags. Such a coding 
scheme would require a standard form and databases as 
well. 

The ability to efficiently synthesize DNA oligomers 
in milli-molar quantities exists as well as the ability to 
detect just a few DNA molecules using the polymerase 
chain reaction. This provides the requisite technologies 
for a universal tagging scheme to identify many biolog- 
ical and nonbiological materials. 
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The DNA sequences described in this paper represent 
the first generation of standard reference DNA 
molecules that are designed, synthesized, constructed 
and demonstrated to have precise, well defined chemical 
composition and molecular properties. The NIST tagged 
DNA molecules provide the materials for constructing 
the next generations of DNA standard reference materi- 
als. 

A second generation of standard reference DNA 
molecules is under construction and has been designed 
with complex DNA sequences that function as test sub- 
strates for the different components in DNA sequence 
analysis. These components include DNA replication, 
enzymology, electrophoretic fragment separation, detec- 
tion, and software analysis. 

In summary, this paper presents and describes two 
new DNA molecules for use as Standard Reference Ma- 
terials that are designed to have specific chemical and 
biochemical properties and then synthesized, cloned, 
and characterized to have the correct DNA sequence. 
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